Skip to content

perf(orchestrator): add memfile dedup density threshold#2862

Open
ValentaTomas wants to merge 13 commits into
mainfrom
valenta/memfile-dedup-density-threshold
Open

perf(orchestrator): add memfile dedup density threshold#2862
ValentaTomas wants to merge 13 commits into
mainfrom
valenta/memfile-dedup-density-threshold

Conversation

@ValentaTomas
Copy link
Copy Markdown
Member

@ValentaTomas ValentaTomas commented May 29, 2026

Adds disabled-by-default controls for memfile dedup fetch fragmentation. When enabled, dedup can promote the cheapest parent-hit pages into the current diff to keep distinct non-empty backing fetch windows under a configured budget, without ever storing empty pages.

Config keys: maxFetchWindowsPerBlock, maxPromotedParentPagesPerBlock, and fetchRunWindowPages (all 0 = disabled/default window).

@cla-bot cla-bot Bot added the cla-signed label May 29, 2026
@cursor
Copy link
Copy Markdown

cursor Bot commented May 29, 2026

PR Summary

Medium Risk
Changes snapshot memfile diff shape and restore fetch patterns on the pause/export path; mitigated by disabled-by-default flag values and extensive tests, but mis-tuned budgets could inflate diffs or leave excess fragmentation.

Overview
Memfile diff dedup now supports an optional fetch-window budget: after comparing against the parent memfile, it can selectively promote parent-matching pages into the written diff so distinct backing fetch windows per block stay under configured limits, trading a slightly larger diff for fewer fragmented reads at restore. Dedup logic moves into a dedicated module with block-level classification (empty / parent / current) and a greedy fetch windower; telemetry gains promotion counters. DedupBudget is threaded through cache, memfd-async dedup, and snapshot export, populated from memfile-diff-dedup when dedup is enabled (maxFetchWindowsPerBlock, maxPromotedParentPagesPerBlock, fetchRunWindowPages; all zero keeps today’s behavior). GetShiftedMapping now rejects a nil header safely.

Reviewed by Cursor Bugbot for commit 146c007. Bugbot is set up for automated code reviews on this repo. Configure here.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 29, 2026

❌ 7 Tests Failed:

Tests completed Failed Passed Skipped
2752 7 2745 7
View the full list of 7 ❄️ flaky test(s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestSandboxListPaginationRunningLargerLimit

Flake rate in main: 42.64% (Passed 764 times, Failed 568 times)

Stack Traces | 97.2s run time
=== RUN   TestSandboxListPaginationRunningLargerLimit
    sandbox_list_test.go:327: Created sandbox 1/12: i6y20atbyztpw26ypowty
    sandbox_list_test.go:327: Created sandbox 2/12: iqbo4v4zf9cj558vs7u5j
    sandbox_list_test.go:327: Created sandbox 3/12: ike6s7bne3dsirlccqb6e
    sandbox_list_test.go:327: Created sandbox 4/12: iq9rz9vn8etbqcedh7wf2
    sandbox_list_test.go:327: Created sandbox 5/12: ikwwtoc6m80zxn7bt6tk1
    sandbox_list_test.go:327: Created sandbox 6/12: ipy800onsr3ogvxk2k0qq
    sandbox_list_test.go:327: Created sandbox 7/12: io2tw4ii52ccdhdyc5zvq
    sandbox_list_test.go:327: Created sandbox 8/12: it58nad5a6ihefa56l7wp
    sandbox_list_test.go:327: Created sandbox 9/12: iyzd5a5c6tl6csvmqepmi
    sandbox_list_test.go:327: Created sandbox 10/12: ipk7ylt5ftvweytx0a85n
    sandbox_list_test.go:327: Created sandbox 11/12: i7vnc3ez8u13ikt0hgsik
    sandbox_list_test.go:327: Created sandbox 12/12: i4a5hpsl0r2jpgp3230s6
    sandbox_list_test.go:330: 
        	Error Trace:	.../api/sandboxes/sandbox_list_test.go:340
        	            				.../hostedtoolcache/go/1.26.3.../src/runtime/asm_amd64.s:1771
        	Error:      	"[]" should have 12 item(s), but has 0
    sandbox_list_test.go:330: 
        	Error Trace:	.../api/sandboxes/sandbox_list_test.go:330
        	Error:      	Condition never satisfied
        	Test:       	TestSandboxListPaginationRunningLargerLimit
--- FAIL: TestSandboxListPaginationRunningLargerLimit (97.20s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestSnapshotTemplateConcurrentOperations

Flake rate in main: 41.54% (Passed 753 times, Failed 535 times)

Stack Traces | 0s run time
=== RUN   TestSnapshotTemplateConcurrentOperations
=== PAUSE TestSnapshotTemplateConcurrentOperations
=== CONT  TestSnapshotTemplateConcurrentOperations
--- FAIL: TestSnapshotTemplateConcurrentOperations (0.00s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestSnapshotTemplateConcurrentOperations/pause_during_snapshot_creation

Flake rate in main: 41.54% (Passed 753 times, Failed 535 times)

Stack Traces | 36.9s run time
=== RUN   TestSnapshotTemplateConcurrentOperations/pause_during_snapshot_creation
=== PAUSE TestSnapshotTemplateConcurrentOperations/pause_during_snapshot_creation
=== CONT  TestSnapshotTemplateConcurrentOperations/pause_during_snapshot_creation
    snapshot_template_test.go:398: snapshot response: 201
    snapshot_template_test.go:424: 
        	Error Trace:	.../api/sandboxes/snapshot_template_test.go:424
        	Error:      	Not equal: 
        	            	expected: 204
        	            	actual  : 409
        	Test:       	TestSnapshotTemplateConcurrentOperations/pause_during_snapshot_creation
        	Messages:   	pause during snapshotting should wait and succeed, body: {"code":409,"message":"Error pausing sandbox - sandbox 'ii50rup7nzu8avzy4ly8y' is already paused"}
--- FAIL: TestSnapshotTemplateConcurrentOperations/pause_during_snapshot_creation (36.91s)
github.com/e2b-dev/infra/tests/integration/internal/tests/orchestrator::TestSandboxMemoryIntegrity

Flake rate in main: 57.45% (Passed 757 times, Failed 1022 times)

Stack Traces | 66.8s run time
=== RUN   TestSandboxMemoryIntegrity
=== PAUSE TestSandboxMemoryIntegrity
=== CONT  TestSandboxMemoryIntegrity
    sandbox_memory_integrity_test.go:27: Build completed successfully
--- FAIL: TestSandboxMemoryIntegrity (66.84s)
github.com/e2b-dev/infra/tests/integration/internal/tests/orchestrator::TestSandboxMemoryIntegrity/tmpfs_hash

Flake rate in main: 57.56% (Passed 747 times, Failed 1013 times)

Stack Traces | 205s run time
=== RUN   TestSandboxMemoryIntegrity/tmpfs_hash
=== PAUSE TestSandboxMemoryIntegrity/tmpfs_hash
=== CONT  TestSandboxMemoryIntegrity/tmpfs_hash
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{start:{pid:1275}}
Executing command bash in sandbox ingr61jdqz4zlznmn7bko (user: root)
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Total memory: 985 MB\nUsed memory before tmpfs mount: 191 MB\nFree memory before tmpfs mount: 793 MB\nMemory to use in integrity test (60% of free, min 64MB): 475 MB\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"475+0 records in\n475+0 records out\n498073600 bytes (498 MB, 475 MiB) copied, 3.5927 s, 139 MB/s\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"\t"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"Comm"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"and bein"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"g time"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"d: \""}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"dd if=/dev/urandom "}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"of=/mnt/testfile bs=1M count=475\"\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"\tUser ti"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"me (se"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"conds)"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:": "}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"0.00"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"\n\tSystem time (seconds): 3.53\n\t"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"Percent of CPU this job got: 98%\n\tElapsed (wall clock) time (h:mm:ss or m:ss): 0:03.59\n\tAverage shared text size (kbytes): 0\n\tAverage unshared data size (kby"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"tes): "}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"0\n\tAver"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"age st"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"ack size ("}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"kbytes)"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:": 0\n\tA"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"verage"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:" total"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:" size (kby"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"tes): "}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"0\n\tMaxi"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"mum re"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"sident"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:" set siz"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"e (kbytes): 2648\n\tAverage resident set "}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"size (kbyte"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"s): 0\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"\tMajor (re"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"q"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"uiring I/O) page faults: 2\n\tMinor (reclaiming a frame) page faults: 341\n\tVoluntary context switches: 3\n\tInvoluntary context switches: 24\n\tSwaps: 0"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"\n\tFile s"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"ystem "}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"inputs: 176"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"\n\tFile system outputs: 0\n\tSocket "}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"messages s"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"ent: 0"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"\n\tSocke"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"t messages"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:" re"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"ceived: "}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"0\n\tSignals"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:" delivered"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:": 0\n\tPa"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"ge siz"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"e (byt"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"es): "}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"4096\n\tExit status: 0\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Used memory after tmpfs mount and file fill: 670 MB\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{end:{exited:true  status:"exit status 0"}}
    sandbox_memory_integrity_test.go:70: Command [bash] completed successfully in sandbox in6uj4065zyipbwnnghic
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{start:{pid:1292}}
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{data:{stdout:"5ed1d27d8076153c2da5f7c6a60ef2ef4a17f3231b7925683cfeb5238fb79c13\n"}}
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{end:{exited:true  status:"exit status 0"}}
    sandbox_memory_integrity_test.go:80: Command [bash] completed successfully in sandbox in6uj4065zyipbwnnghic
Executing command bash in sandbox i7gg1psun20mgs4co4euh (user: root)
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{start:{pid:1295}}
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
Executing command bash in sandbox in6uj4065zyipbwnnghic (user: root)
    sandbox_memory_integrity_test.go:110: 
        	Error Trace:	.../tests/orchestrator/sandbox_memory_integrity_test.go:81
        	            				.../hostedtoolcache/go/1.26.3.../src/runtime/asm_amd64.s:1771
        	Error:      	Received unexpected error:
        	            	failed to execute command bash in sandbox in6uj4065zyipbwnnghic: unavailable: HTTP status 502 Bad Gateway
    sandbox_memory_integrity_test.go:110: 
        	Error Trace:	.../tests/orchestrator/sandbox_memory_integrity_test.go:78
        	            				.../tests/orchestrator/sandbox_memory_integrity_test.go:110
        	Error:      	Condition never satisfied
        	Test:       	TestSandboxMemoryIntegrity/tmpfs_hash
--- FAIL: TestSandboxMemoryIntegrity/tmpfs_hash (204.81s)
github.com/e2b-dev/infra/tests/integration/internal/tests/proxies::TestEnvdAccessTokenAutoResumeViaProxy

Flake rate in main: 42.78% (Passed 753 times, Failed 563 times)

Stack Traces | 10.8s run time
=== RUN   TestEnvdAccessTokenAutoResumeViaProxy
=== PAUSE TestEnvdAccessTokenAutoResumeViaProxy
=== CONT  TestEnvdAccessTokenAutoResumeViaProxy
    traffic_access_token_test.go:357: 
        	Error Trace:	.../tests/proxies/traffic_access_token_test.go:357
        	Error:      	Received unexpected error:
        	            	Get "http://localhost:3002/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
        	Test:       	TestEnvdAccessTokenAutoResumeViaProxy
--- FAIL: TestEnvdAccessTokenAutoResumeViaProxy (10.81s)
github.com/e2b-dev/infra/tests/integration/internal/tests/proxies::TestSandboxAutoResumeViaProxy

Flake rate in main: 43.28% (Passed 751 times, Failed 573 times)

Stack Traces | 13.2s run time
=== RUN   TestSandboxAutoResumeViaProxy
=== PAUSE TestSandboxAutoResumeViaProxy
=== CONT  TestSandboxAutoResumeViaProxy
    auto_resume_test.go:116: 
        	Error Trace:	.../tests/proxies/auto_resume_test.go:116
        	Error:      	Received unexpected error:
        	            	Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
        	Test:       	TestSandboxAutoResumeViaProxy
--- FAIL: TestSandboxAutoResumeViaProxy (13.19s)

To view more test analytics, go to the Test Analytics Dashboard
📋 Got 3 mins? Take this short survey to help us improve Test Analytics.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

There are no critical findings or correctness issues identified in this pull request.

@ValentaTomas ValentaTomas marked this pull request as ready for review May 29, 2026 22:06
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6d2a1dec8f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread packages/orchestrator/pkg/sandbox/block/cache.go Outdated
@ValentaTomas ValentaTomas force-pushed the valenta/memfile-dedup-density-threshold branch from 6d2a1de to dc32af5 Compare May 29, 2026 23:13
@ValentaTomas ValentaTomas force-pushed the valenta/memfile-dedup-density-threshold branch from dc32af5 to 45404d1 Compare May 29, 2026 23:47
Copy link
Copy Markdown
Contributor

@levb levb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(still reviewing/understanding)

This approach scans in 1 direction and "locally" - the decision on whether to do a parent frame fetch (or store the page locally) is made solely on a single child's relationship to it. If there are parent frames with many children references, they might still be tossed out by the single-child based criteria.

For now, I am asking to move deduplication into its own file, with a more apparent hook in chunk.go for deduplication. This would make it a little easier to compare strategies if a different one appears in the future.

Comment thread packages/orchestrator/pkg/sandbox/block/cache.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/block/cache.go Outdated
Copy link
Copy Markdown
Contributor

@dobrac dobrac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code flow is really complex. I would prefer simplifying (not a blocker though).

Please add more unit tests for the added functionality.

Comment thread packages/orchestrator/pkg/sandbox/fc/memory.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/block/cache.go Outdated
@ValentaTomas ValentaTomas force-pushed the valenta/memfile-dedup-density-threshold branch 2 times, most recently from eb72ac0 to 59c2c0d Compare May 30, 2026 23:37
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 59c2c0d958

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread packages/orchestrator/pkg/sandbox/block/cache.go Outdated
@ValentaTomas ValentaTomas force-pushed the valenta/memfile-dedup-density-threshold branch from 59c2c0d to e4f45c3 Compare May 31, 2026 00:03
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e4f45c3798

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread packages/orchestrator/pkg/sandbox/block/cache.go Outdated
@ValentaTomas ValentaTomas requested review from dobrac and levb May 31, 2026 05:01
dobrac and others added 7 commits May 31, 2026 00:52
Give the dedup page-kind and fetch-source enums named byte types instead
of bare byte, and rename dedupFetchKey fields source/build to
sourceType/buildID for clarity. No behavior change.
Return an error instead of panicking when called on a nil *Header, so
callers can treat a missing header as 'no mapping' without a separate
nil check.
Break the monolithic dedupCompare (cyclomatic complexity 22) into a
read-only dedupConfig with per-block compareBlock and per-page
classifyPage helpers, plus an in-place dedupAccum accumulator. Extract
the fetch-window promotion heuristic onto a fetchWindower receiver,
collapsing the duplicated benefit/cost argmax into bestByRatio and the
run/key candidate scanning into the parentRuns and parentKeyGroups
iterators. parentKeyGroups now yields groups ordered by first page index
so promotion selection is deterministic regardless of map order.

classifyPage drops its baseHeader nil check now that GetShiftedMapping
is nil-safe. No behavior change.
Add unit tests for the extracted dedup helpers: parentRuns segmentation
(empties bridge runs, current pages and ends bound them), parentKeyGroups
deterministic ordering, fetchWindower count/bestParentRun including the
key-group fallback, and compareBlock in-place accumulation.
Verify that when the promotion budget is zero, parent pages matching the
base stay deduped even if the block exceeds MaxFetchWindowsPerBlock,
rather than being over-promoted into the diff.
Add direct unit tests for recordBlockPages (bitmap routing, absOff page
index offset, stored-page count) and fetchWindower.compact (budget
guards, no-op when within window budget, parent-run promotion).
Add direct unit tests for classifyPage (empty/parent/current routing,
mapping vs offset fetch-window keying, best-effort uncached skip) and the
fetchWindower countAfter/bestByRatio budget and benefit gating.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9d5eb29694

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread packages/orchestrator/pkg/sandbox/block/cache.go Outdated
Extract the dedup classification, fetch-window promotion, and drain logic out
of the generic block cache into dedup.go (same package, no behavior change),
leaving Cache.Dedup as the hook so the strategy is contained and easier to
swap or compare later.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants